Unsupervised Outlier Detection


GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion

Zhang, Yuang, Wang, Liping, Huang, Yihong, Zheng, Yuanxing

arXiv.org Artificial Intelligence

Unsupervised Outlier Detection (UOD) is a critical task in data mining and machine learning, aiming to identify instances that significantly deviate from the majority. Without any labels, deep UOD methods struggle with the misalignment between the model's direct optimization goal and the final performance goal of the Outlier Detection (OD) task. Through the perspective of training dynamics, this paper proposes an early stopping algorithm to optimize the training of deep UOD models, ensuring they perform optimally in OD rather than overfitting the entire contaminated dataset. Inspired by the UOD mechanism and the inlier-priority phenomenon, whereby models intuitively fit inliers more quickly than outliers, we propose GradStop, a sampling-based, label-free algorithm that estimates the model's real-time performance during training. First, a sampling method generates two sets, one likely containing more outliers and the other more inliers; then a metric based on gradient cohesion is applied to probe the current training dynamics, which reflect the model's performance on the OD task. Experimental results on 4 deep UOD algorithms across 47 real-world datasets, together with theoretical proofs, demonstrate the effectiveness of our proposed early stopping algorithm in enhancing the performance of deep UOD models. An Auto Encoder (AE) enhanced by GradStop achieves better performance than the vanilla AE, other SOTA UOD methods, and even ensemble AEs. Our method provides a robust and effective solution to the problem of performance degradation during training, enabling deep UOD models to better realize their potential in anomaly detection tasks.
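The two-step idea in the abstract (sample a likely-inlier and a likely-outlier set, then probe a gradient-cohesion metric) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the loss-quantile sampling rule, the cohesion formula (norm of the mean gradient over the mean gradient norm), and the stopping threshold are all assumptions for demonstration.

```python
import numpy as np

def split_by_loss(losses, frac=0.2):
    """Sampling step (assumed rule): low-loss samples are likely inliers,
    high-loss samples are likely outliers."""
    order = np.argsort(losses)
    k = max(1, int(frac * len(losses)))
    return order[:k], order[-k:]  # likely-inlier indices, likely-outlier indices

def gradient_cohesion(grads):
    """One plausible cohesion metric: norm of the mean per-sample gradient
    divided by the mean of the per-sample gradient norms. Close to 1 when
    gradients point the same way, close to 0 when they cancel out."""
    mean_g = grads.mean(axis=0)
    return np.linalg.norm(mean_g) / (np.mean(np.linalg.norm(grads, axis=1)) + 1e-12)

def should_stop(losses, grads, frac=0.2, threshold=0.5):
    """Hypothetical stopping rule: once the likely-outlier set's gradients become
    nearly as cohesive as the likely-inlier set's, the model has started fitting
    outliers too, so training should stop."""
    inl_idx, out_idx = split_by_loss(losses, frac)
    return gradient_cohesion(grads[out_idx]) >= threshold * gradient_cohesion(grads[inl_idx])
```

In a training loop, `losses` and `grads` would be per-sample reconstruction losses and per-sample gradients collected at the end of each epoch; the loop breaks when `should_stop` first returns `True`.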


Pragmatic Machine Learning with Python: Learn How to Deploy Machine Learning Models in Production (English Edition): Nag, Avishek: 9789389845365: Amazon.com: Books

#artificialintelligence

Get familiar with practical concepts of Machine Learning from ground zero. Learn how to deploy Machine Learning models in production. Understand how to do "Data Science Storytelling". Explore the latest industry topics in Machine Learning.


Meta-Learning for Unsupervised Outlier Detection with Optimal Transport

Singh, Prabhant, Vanschoren, Joaquin

arXiv.org Artificial Intelligence

Automated machine learning has been widely researched and adopted in the field of supervised classification and regression, but progress in unsupervised settings has been limited. We propose a novel approach to automating outlier detection based on meta-learning from previous datasets with outliers. Our premise is that the selection of the optimal outlier detection technique depends on the inherent properties of the data distribution. In particular, we leverage optimal transport to find the dataset with the most similar underlying distribution, and then apply the outlier detection techniques that proved to work best for that data distribution. We evaluate the robustness of our approach and find that it outperforms state-of-the-art methods in unsupervised outlier detection. This approach can also be easily generalized to automate other unsupervised settings.
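The selection mechanism described above can be sketched in a few lines. This is not the authors' pipeline: the sliced 1-D Wasserstein distance below is a crude, dependency-free stand-in for full optimal transport, and the `meta_db` structure (pairs of past dataset and its best-performing detector name) is a hypothetical meta-knowledge store.

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples: mean absolute
    difference of their sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def sliced_wasserstein(X, Y, n_proj=50, seed=0):
    """Crude sliced-Wasserstein distance between two equal-size datasets:
    average 1-D W1 over random unit projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        total += wasserstein_1d(X @ v, Y @ v)
    return total / n_proj

def select_detector(new_X, meta_db):
    """Meta-learning step: pick the detector that worked best on the most
    distributionally similar past dataset."""
    dists = [sliced_wasserstein(new_X, past_X) for past_X, _ in meta_db]
    return meta_db[int(np.argmin(dists))][1]
```

A real system would use a proper OT solver (e.g., the POT library) and dataset meta-features rather than raw samples, but the flow (distance to past datasets, then transfer of the best detector) is the same.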


A geometric framework for outlier detection in high-dimensional data

Herrmann, Moritz, Pfisterer, Florian, Scheipl, Fabian

arXiv.org Artificial Intelligence

Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework that exploits the metric structure of a data set. Our approach rests on the manifold assumption, i.e., that the observed, nominally high-dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high-dimensional data. We also suggest a novel, mathematically precise, and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high-dimensional data, but the framework we propose is completely general and we include image and graph data applications. Our results show that the outlier structure of high-dimensional and non-tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors.
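The pipeline the abstract describes (embed high-dimensional data with a manifold learning method, then apply a standard outlier scorer to the embedding vectors) can be illustrated minimally. As an assumption for this sketch, PCA stands in for the nonlinear manifold learners the paper actually uses, and a k-th-nearest-neighbor distance stands in for the standard scoring methods; both function names are invented here.

```python
import numpy as np

def embed_pca(X, k=2):
    """Stand-in for a manifold learner: project centered data onto the top-k
    principal components via SVD. (The paper uses nonlinear manifold learning;
    PCA keeps this sketch dependency-free.)"""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def knn_outlier_scores(Z, k=5):
    """Standard outlier scoring on the embedding vectors: distance to the
    k-th nearest neighbor (column 0 of the sorted distances is the zero
    self-distance, so index k is the k-th neighbor)."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]
```

Swapping `embed_pca` for a nonlinear embedding (Isomap, UMAP, diffusion maps) and `knn_outlier_scores` for LOF recovers the kind of pipeline the paper evaluates.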


Propositionalization for Unsupervised Outlier Detection in Multi-Relational Data

Riahi, Fatemeh (Simon Fraser University) | Schulte, Oliver (Simon Fraser University)

AAAI Conferences

We develop a novel propositionalization approach to unsupervised outlier detection for multi-relational data. Propositionalization summarizes the information in multi-relational data, which is typically stored in multiple tables, into a single data table. The columns in the data table represent conjunctive relational features that are learned from the data. An advantage of propositionalization is that it facilitates applying the many previous outlier detection methods that were designed for single-table data. We show that conjunctive features for outlier detection can be learned from data using statistical-relational methods. Specifically, we apply Markov Logic Network structure learning. Compared to baseline propositionalization methods, Markov Logic propositionalization produces the most compact data tables, whose attributes capture the most complex multi-relational correlations. We apply three representative outlier detection methods (LOF, KNN, and OutRank) to the data tables constructed by propositionalization.
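The overall flow (flatten multi-relational data into one table, then run any single-table detector) can be sketched as below. This is only a baseline illustration: the paper learns conjunctive features via Markov Logic Network structure learning, whereas this sketch uses simple aggregates (count, mean, max) over each entity's related rows, and the z-score detector is a placeholder for LOF, KNN, or OutRank.

```python
import numpy as np

def propositionalize(entity_ids, related):
    """Baseline propositionalization: summarize each entity's related rows
    (e.g., a customer's order amounts) into fixed columns: count, mean, max.
    Entities with no related rows get zeros."""
    rows = []
    for eid in entity_ids:
        vals = related.get(eid, [])
        if vals:
            rows.append([len(vals), float(np.mean(vals)), float(np.max(vals))])
        else:
            rows.append([0, 0.0, 0.0])
    return np.array(rows)

def zscore_outlier_scores(table):
    """Any single-table detector now applies; summed absolute z-scores are
    the simplest possible choice."""
    mu = table.mean(axis=0)
    sd = table.std(axis=0) + 1e-12
    return np.abs((table - mu) / sd).sum(axis=1)
```

Once the single table exists, substituting LOF or KNN for `zscore_outlier_scores` is a drop-in change, which is exactly the advantage the abstract claims for propositionalization.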